
    Industrial internet of things: What does it mean for the bioprocess industries?

    The Industrial Internet of Things (IIoT) is a system of interconnected devices that, through technologies such as soft sensors, cloud computing, data analytics, machine learning and artificial intelligence, provides real-time insight into the operations of any industrial process, from product conceptualisation, process optimisation and manufacturing through to the supply chain. IIoT enables wide-scope data collection and utilisation, and in return reduces errors, increases efficiency and improves understanding of the process. While this novel solution is a pillar of Industry 4.0, the inherent operational complexity of bioprocessing, which arises from the involvement of living systems or their components in manufacturing, makes the sector a challenging one for the implementation of IIoT. A large segment of the industry comprises the manufacture of biopharmaceuticals and advanced therapies, some of the most valuable biotechnological products available, which undergo tight regulatory evaluation and scrutiny from product conceptualisation to patient delivery. Extensive process understanding is what the biopharmaceutical industry strives for; however, the complexity of transitioning to a new mode of operation, the potential misalignment of priorities, the need for substantial investment to facilitate the transition, the limitations imposed by the downtime required, and the essential role of regulatory support all make it challenging for the industry to adopt IIoT solutions and integrate them with biomanufacturing operations. There is currently a need for universal solutions that would streamline the implementation of IIoT and overcome the widespread reluctance observed in the sector: solutions that recommend accessible implementation strategies and effective employee training, and in return offer valuable insights to advance processing and manufacturing operations within their respective regulatory frameworks.

    Improving functional annotation for industrial microbes: a case study with Pichia pastoris.

    The research communities studying microbial model organisms, such as Escherichia coli or Saccharomyces cerevisiae, are well served by model organism databases with extensive functional annotation. However, this is not true of many industrial microbes that are used widely in biotechnology. In this Opinion piece, we use Pichia (Komagataella) pastoris to illustrate the limitations of the available annotation. We consider the resources that can be implemented in the short term, both to improve Gene Ontology (GO) annotation coverage through annotation transfer, and to establish curation pipelines for the literature corpus of this organism. We gratefully acknowledge funding from the Wellcome Trust (PomBase and Canto; WT090548MA to SGO) and the EU 7th Framework Programme (BIOLEDGE Contract No: 289126 to SGO). This is the published version, distributed under a Creative Commons Attribution License 2.0, which can also be found on the publisher's website at: http://www.sciencedirect.com/science/article/pii/S0167779914001061

    CamOptimus: a tool for exploiting complex adaptive evolution to optimize experiments and processes in biotechnology

    Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, owing to a combinatorial explosion in the number of experiments to be performed, or inaccessible to most experimentalists because of the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface that empowers a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model that evaluates the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method with an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: https://doi.org/10.17863/CAM.10257. EU 7th Framework Programme (BIOLEDGE Contract No: 289126 to S. G. O. and J. R.), BBSRC (BRIC2.2 to S. G. O. and N. K. H. S.), Synthetic Biology Research Initiative Cambridge (SynBioFund to D. D., A. C. C. and J. M. L. D.).
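    The Genetic Algorithm component can be illustrated with a minimal sketch (this is not CamOptimus itself; the toy response surface, parameter bounds and GA settings below are all invented for illustration):

    ```python
    import random

    random.seed(0)

    # Hypothetical response surface standing in for a real culture experiment:
    # the "yield" peaks at temperature=30, pH=6.0, feed=2.5 (values invented).
    def culture_yield(temp, ph, feed):
        return -((temp - 30) ** 2 + 4 * (ph - 6.0) ** 2 + (feed - 2.5) ** 2)

    BOUNDS = [(20, 40), (4.0, 8.0), (0.0, 5.0)]  # (temperature, pH, feed rate)

    def random_individual():
        return [random.uniform(lo, hi) for lo, hi in BOUNDS]

    def mutate(ind, rate=0.3):
        # Gaussian perturbation, clipped back into the allowed bounds.
        return [min(hi, max(lo, g + random.gauss(0, 0.5))) if random.random() < rate else g
                for g, (lo, hi) in zip(ind, BOUNDS)]

    def crossover(a, b):
        # Uniform crossover: each parameter inherited from either parent.
        return [ga if random.random() < 0.5 else gb for ga, gb in zip(a, b)]

    def genetic_algorithm(pop_size=30, generations=60):
        pop = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda ind: culture_yield(*ind), reverse=True)
            elite = pop[: pop_size // 2]  # truncation selection keeps the best half
            children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                        for _ in range(pop_size - len(elite))]
            pop = elite + children
        return max(pop, key=lambda ind: culture_yield(*ind))

    best = genetic_algorithm()
    print(best)
    ```

    In practice each fitness evaluation would be a wet-lab experiment rather than a function call, which is why keeping the population small and the search efficient matters.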

    UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

    Background: Collective analysis of the increasing number of available gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify, in a tuneable manner, the subsets of genes that are consistently co-expressed in all of the provided datasets. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods by applying them to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations that may not always be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset, as well as the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and use it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles, which combines the ground-truth knowledge of synthetic data with the realistic expression values of real data, and therefore overcomes the problem of faithfulness in synthetic expression data modelling. By applying UNCLES to those datasets, we validate it while comparing it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by applying it to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts. Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application in the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004).
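    The core idea of unifying clustering results across datasets can be sketched with a toy consensus step (a deliberate simplification, not the Bi-CoPaM/UNCLES algorithm; the gene names and per-dataset cluster labels below are invented):

    ```python
    import numpy as np

    # Toy stand-in for per-dataset clustering results: for each of three datasets,
    # one cluster label per gene (label values are arbitrary; only co-membership
    # of genes within the same cluster matters).
    genes = ["g%d" % i for i in range(8)]
    labels_per_dataset = [
        [0, 0, 0, 1, 1, 2, 2, 2],  # dataset A
        [1, 1, 1, 0, 0, 2, 2, 0],  # dataset B
        [2, 2, 2, 1, 1, 0, 0, 1],  # dataset C
    ]

    def co_membership(labels):
        """Boolean matrix: True where two genes share a cluster in this dataset."""
        lab = np.asarray(labels)
        return lab[:, None] == lab[None, :]

    # Genes consistently co-clustered in ALL datasets: intersect co-membership.
    consensus = np.logical_and.reduce([co_membership(l) for l in labels_per_dataset])

    # Extract groups of genes that are mutually co-clustered everywhere.
    groups, seen = [], set()
    for i in range(len(genes)):
        if i in seen:
            continue
        members = [j for j in range(len(genes)) if consensus[i, j]]
        if len(members) > 1 and all(consensus[a, b] for a in members for b in members):
            groups.append([genes[j] for j in members])
            seen.update(members)

    print(groups)
    ```

    UNCLES additionally allows the complementary query (co-expressed in one subset of datasets but not another), which this intersection-only sketch omits.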

    Inclusion of maintenance energy improves the intracellular flux predictions of CHO

    Chinese hamster ovary (CHO) cells are the leading platform for the production of biopharmaceuticals with human-like glycosylation. The standard practice for cell line generation relies on trial-and-error approaches, such as adaptive evolution and high-throughput screening, which typically take several months. Metabolic modeling could aid in designing better producer cell lines and thus shorten development times. The genome-scale metabolic model (GSMM) of CHO can accurately predict growth rates. However, in order to predict rational engineering strategies it also needs to accurately predict intracellular fluxes. In this work we evaluated the agreement between the fluxes predicted by parsimonious flux balance analysis (pFBA) using the CHO GSMM and a wide range of 13C metabolic flux data from the literature. While glycolytic fluxes were predicted relatively well, the fluxes of the tricarboxylic acid (TCA) cycle were vastly underestimated because the model's energy demand was set too low. Inclusion of a computationally estimated maintenance energy significantly improved the overall accuracy of intracellular flux predictions. Maintenance energy was therefore determined experimentally by running continuous cultures at different growth rates and evaluating their respective energy consumption. The experimentally and computationally determined maintenance energies were in good agreement. Additionally, we compared alternative objective functions (minimization of the uptake rates of seven nonessential metabolites) to the biomass objective. While the predictions of the uptake rates were quite inaccurate for most objectives, the predictions of the intracellular fluxes were comparable to those of the biomass objective function. COMET center acib: Next Generation Bioproduction, which is funded by BMK, BMDW, SFG, Standortagentur Tirol, Government of Lower Austria and Vienna Business Agency in the framework of COMET - Competence Centers for Excellent Technologies. The COMET Funding Program is managed by the Austrian Research Promotion Agency FFG; D.S., J.S., M.W., M.H., D. E.R. This work has also been supported by the PhD program BioToP of the Austrian Science Fund (FWF Project W1224).
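    The effect of adding a maintenance energy demand can be sketched with a toy flux balance problem (a deliberately small stand-in, not the CHO genome-scale model; all stoichiometric coefficients, bounds and the maintenance value below are invented):

    ```python
    from scipy.optimize import linprog

    # Toy network with fluxes x = [v_glc, v_tca, v_biomass]:
    #   carbon balance at steady state:  v_glc = v_tca + v_biomass
    #   ATP balance at steady state:     2*v_glc + 28*v_tca = 20*v_biomass + maintenance
    def predict_fluxes(maintenance_atp):
        c = [0.0, 0.0, -1.0]                       # maximize biomass flux
        A_eq = [[1.0, -1.0, -1.0],                 # carbon balance
                [2.0, 28.0, -20.0]]                # ATP balance
        b_eq = [0.0, maintenance_atp]
        bounds = [(0, 10), (0, None), (0, None)]   # glucose uptake capped at 10
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x  # [v_glc, v_tca, v_biomass]

    no_maint = predict_fluxes(0.0)
    with_maint = predict_fluxes(48.0)
    print(no_maint, with_maint)
    ```

    In this toy model, adding the fixed ATP drain forces more glucose through the oxidative (TCA) branch and lowers the predicted growth rate, mirroring the direction of the effect reported in the abstract.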

    Metaheuristic approaches in biopharmaceutical process development data analysis

    There is growing interest in the mining and handling of big data, which has been rapidly accumulating in the repositories of the bioprocess industries. Biopharmaceutical industries are no exception; the implementation of advanced process control strategies based on multivariate monitoring techniques in biopharmaceutical production has given rise to large amounts of data. Real-time measurements of critical quality and performance attributes collected during production can be highly useful for understanding and modelling biopharmaceutical processes. Data mining can facilitate the extraction of meaningful relationships pertaining to these bioprocesses and predict the performance of future cultures. This review evaluates the suitability of various metaheuristic methods available for data pre-processing, which involves the handling of missing data, the visualisation of the data and dimension reduction, and for data processing, which focuses on modelling of the data and the optimisation of these models in the context of biopharmaceutical process development. The advantages and the associated challenges of employing different methodologies in the pre-processing and processing of the data are discussed. In light of these evaluations, a summary guideline is proposed for the handling and analysis of the data generated in biopharmaceutical process development.
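    The dimension-reduction step mentioned among the pre-processing tasks can be sketched with a minimal PCA example (the dataset, the number of latent factors and their interpretation are invented for illustration; this stands in for whichever method a practitioner would choose):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a bioprocess dataset: 50 batches x 6 process
    # variables, where the measurements are driven by 2 latent factors
    # (e.g. overall metabolic activity and feed intensity) plus small noise.
    latent = rng.normal(size=(50, 2))
    loadings = rng.normal(size=(2, 6))
    X = latent @ loadings + 0.05 * rng.normal(size=(50, 6))

    # PCA via singular value decomposition of the mean-centred data.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # variance explained per component

    # Project onto the first two principal components: 50 x 2 representation.
    scores = Xc @ Vt[:2].T
    print(explained[:2].sum())
    ```

    Because the toy data were generated from two latent factors, two components capture almost all the variance; real bioprocess data would need this to be checked rather than assumed.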

    Data intelligence for process performance prediction in biologics manufacturing

    Despite the availability of large amounts of data in bioprocess databases, little has been done to analyse them retrospectively for process improvement. Historical bioprocess data are multivariate time series and, owing to their inherent nature, are incompatible with a variety of statistical methods employed in data analysis, resulting in the lack of a tailored methodology. We present here an integrative knowledge-discovery framework tailored to handling historical bioprocess datasets. The pipeline successfully predicts process performance at harvest from an early time point, and robustly identifies the most relevant process parameters for modelling process performance. We demonstrate the utility of this pipeline on biologics manufacturing data from upstream bioprocess development for antibody production by mammalian cells. The proposed multi-model system, which employs machine learning, can predict performance at harvest after two weeks of operation with satisfactory accuracy, using data generated as early as the sixth day of the culture. Medimmune (AstraZeneca).
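    The idea of predicting harvest performance from an early time point can be sketched as follows (synthetic data and a plain least-squares model standing in for the multi-model machine-learning system; the day-6 variables, the day-14 titre and the relation between them are all invented):

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    # Synthetic stand-in for 200 historical upstream batches: day-6 measurements
    # (viable cell density, glucose, lactate) and a harvest titre at day 14
    # generated from an invented linear relation plus noise.
    n = 200
    day6 = rng.normal(loc=[8.0, 4.0, 2.0], scale=[1.5, 0.8, 0.5], size=(n, 3))
    titre = 0.9 * day6[:, 0] - 0.4 * day6[:, 2] + 1.2 + 0.1 * rng.normal(size=n)

    train, test = slice(0, 150), slice(150, None)

    # Ordinary least squares as the simplest "early prediction" model.
    A = np.column_stack([day6[train], np.ones(150)])
    coef, *_ = np.linalg.lstsq(A, titre[train], rcond=None)

    # Evaluate on held-out batches with the coefficient of determination.
    A_test = np.column_stack([day6[test], np.ones(n - 150)])
    pred = A_test @ coef
    r2 = 1 - np.sum((titre[test] - pred) ** 2) / np.sum((titre[test] - titre[test].mean()) ** 2)
    print(round(r2, 3))
    ```

    Real culture data would be noisier and nonlinear, which is what motivates the multi-model machine-learning approach in the abstract over a single linear fit.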

    A heuristic approach to handling missing data in biologics manufacturing databases

    The biologics sector has amassed a wealth of data in the past three decades, in line with bioprocess development and manufacturing guidelines, and precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. Historical bioprocessing data are likely to comprise experiments conducted with different cell lines, producing different products, possibly years apart; this situation causes inter-batch variability, and missing data points arise from human- and instrument-associated technical oversights. These unavoidable complications necessitate the introduction of a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in the missing information in historical biomanufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, and regression imputation was effective, whilst maintaining the existing standard deviation and shape of the distribution, in dynamical datasets with less than 30% missing data. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
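    The contrast between mean imputation and regression imputation under Missing Completely At Random data can be sketched as follows (synthetic data; the two process variables and the correlation between them are invented for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # Two correlated process variables; y depends linearly on x plus noise.
    n = 300
    x = rng.normal(10, 2, size=n)
    y = 3.0 * x + rng.normal(0, 1, size=n)

    # Knock out 20% of y completely at random (MCAR).
    missing = rng.random(n) < 0.2
    y_obs = y.copy()
    y_obs[missing] = np.nan

    # Mean imputation: replace every gap with the observed mean.
    y_mean = y_obs.copy()
    y_mean[missing] = np.nanmean(y_obs)

    # Regression imputation: predict the gaps from the correlated variable x.
    mask = ~missing
    slope, intercept = np.polyfit(x[mask], y_obs[mask], 1)
    y_reg = y_obs.copy()
    y_reg[missing] = slope * x[missing] + intercept

    # Compare reconstruction error on the knocked-out entries only.
    rmse = lambda a: float(np.sqrt(np.mean((a[missing] - y[missing]) ** 2)))
    print(rmse(y_mean), rmse(y_reg))
    # Regression imputation also preserves the spread of the distribution,
    # whereas mean substitution shrinks the standard deviation.
    print(np.std(y_mean), np.std(y_reg), np.std(y))
    ```

    This illustrates the point made above: when a correlated predictor exists, regression imputation both reduces reconstruction error and maintains the standard deviation and shape of the distribution, while mean substitution artificially compresses it.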